In this project, I work on time series prediction with neural networks, covering both a classification task and an estimation (regression) task on a modified "Air Quality" dataset. The dataset contains hourly averaged responses from chemical sensors embedded in an air quality monitoring device, with data recorded from March 2004 to February 2005.
Develop a neural network to predict whether the concentration of Carbon Monoxide (CO) exceeds the mean of the CO(GT) values. Perform binary classification to categorize instances as above or below this threshold. Handle missing values in the dataset.
Develop a neural network to predict the concentration of Nitrogen Oxides (NOx) from the other air quality features. Estimate a continuous numerical value using regression techniques. Handle missing values in the dataset.
# z5499630 Boyang, Peng
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
import seaborn as sns
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2
# Read data from the given xlsx file
AQ_file = 'AirQualityUCI _ Students.xlsx'
AQ_data = pd.read_excel(AQ_file)
# Calculate the minimum and maximum values of each column and create a DataFrame to display the results
min_max_df = pd.DataFrame({'Min': AQ_data.min(), 'Max': AQ_data.max()})
print(min_max_df)
# min_max_df
                             Min                  Max
Date         2004-03-10 00:00:00  2005-04-01 00:00:00
Time                    00:00:00             23:00:00
CO(GT)                    -200.0                 11.9
PT08.S1(CO)               -200.0              2007.75
NMHC(GT)                    -200                 1189
C6H6(GT)                  -200.0            63.741476
PT08.S2(NMHC)             -200.0               2214.0
NOx(GT)                   -200.0               1479.0
PT08.S3(NOx)              -200.0              2682.75
NO2(GT)                   -200.0                339.7
PT08.S4(NO2)              -200.0               2775.0
PT08.S5(O3)               -200.0              2522.75
T                         -200.0                 44.6
RH                        -200.0            87.174999
AH                        -200.0             2.231036
# Convert Date column to datetime
AQ_data['Date'] = pd.to_datetime(AQ_data['Date'])
# Set Date as the index
AQ_data.set_index('Date', inplace=True)
# Drop the Time column
AQ_data.drop(columns=['Time'], inplace=True)
# Calculate the number of rows needed to fit all subplots in 2 columns for compact view
nrows = (len(AQ_data.columns) + 1) // 2
fig, axes = plt.subplots(nrows=nrows, ncols=2, figsize=(20, 20))
axes = axes.flatten()
for i, column in enumerate(AQ_data.columns):
    axes[i].plot(AQ_data.index, AQ_data[column], color='purple')
    axes[i].set_title(column)
for j in range(i+1, len(axes)):
    fig.delaxes(axes[j])
plt.tight_layout()
plt.show()
# Check for missing values
missing_value_tally = (AQ_data == -200).sum()
print("Tally of missing value (-200):")
print(missing_value_tally)
# Replace missing values with NaN
data_replaced = AQ_data.replace(-200, np.nan)
# Handle missing data using linear interpolation
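# Illustration (hypothetical values): a gap such as [1.0, NaN, NaN, 4.0]
# is filled linearly to [1.0, 2.0, 3.0, 4.0]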
data_interpolated = data_replaced.interpolate(method='linear', limit_direction='forward', axis=0)
# Plot data after interpolation
for column in data_interpolated.columns:
    plt.figure(figsize=(26, 6))
    data_interpolated[column].plot(title=f'{column} (After Interpolation)', color='purple')
    plt.xlabel('Date')
    plt.ylabel(column)
    plt.show()
Tally of missing value (-200):
CO(GT)           1585
PT08.S1(CO)       366
NMHC(GT)         7525
C6H6(GT)          366
PT08.S2(NMHC)     366
NOx(GT)          1573
PT08.S3(NOx)      366
NO2(GT)          1576
PT08.S4(NO2)      366
PT08.S5(O3)       366
T                 366
RH                366
AH                366
dtype: int64
In descriptive statistics, the interquartile range (IQR) is a measure of statistical dispersion, which is the spread of the data.
It is defined as the difference between the 75th and 25th percentiles of the data.
To calculate the IQR, the data set is divided into quartiles, i.e. four rank-ordered parts of equal size, with the cut points computed via linear interpolation where needed.
These quartiles are denoted by Q1 (also called the lower quartile), Q2 (the median), and Q3 (also called the upper quartile). The lower quartile corresponds to the 25th percentile and the upper quartile to the 75th percentile, so IQR = Q3 − Q1.
The IQR is an example of a trimmed estimator, defined as the 25% trimmed range, which improves robustness by discarding the most outlying points.[1]
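As a small worked example of these definitions (toy numbers, illustration only):
import numpy as np
sample = np.array([1, 3, 5, 7, 9, 11, 13, 100])  # 100 is an obvious outlier
q1, q3 = np.percentile(sample, [25, 75])  # linear interpolation, as above
iqr = q3 - q1  # 11.5 - 4.5 = 7.0
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # (-6.0, 22.0)
print(q1, q3, iqr, lower, upper)  # 100 lies above 22.0, so it is flagged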
# Detect outliers using IQR
Q1 = data_interpolated.quantile(0.25)
Q3 = data_interpolated.quantile(0.75)
IQR = Q3 - Q1
# Define outliers as points outside 1.5*IQR range
outliers_lower_bound = Q1 - 1.5 * IQR
outliers_upper_bound = Q3 + 1.5 * IQR
# Build a boolean mask flagging all the outliers
outliers = (data_interpolated < outliers_lower_bound) | (data_interpolated > outliers_upper_bound)
# Plot data with outliers marked
for column in data_interpolated.columns:
    plt.figure(figsize=(20, 4))
    plt.plot(data_interpolated[column], label=column, color='purple')
    plt.plot(data_interpolated[column][outliers[column]], 'r*', label='Outliers')
    plt.title(f'{column} (With Outliers Marked)')
    plt.xlabel('Date')
    plt.ylabel(column)
    plt.legend()
    plt.show()
# Replace outliers with NaN
data_interpolated[outliers] = np.nan
# Fill NaN values resulting from outlier detection using linear interpolation
data_cleaned = data_interpolated.interpolate(method='linear', limit_direction='forward', axis=0)
# data_cleaned = data_interpolated
# Calculate the mean value for CO(GT), excluding missing values
co_mean = data_cleaned['CO(GT)'].mean()
# Create the binary target variable CO_Target in the data_cleaned DataFrame
# It assigns a value of 1 to the CO_Target column
# if the corresponding value in the CO(GT) column is greater than the calculated mean (co_mean), and 0 otherwise
data_cleaned['CO_Target'] = (data_cleaned['CO(GT)'] > co_mean).astype(int)
# Compute and plot the correlation matrix
plt.figure(figsize=(8, 8))
corr_matrix = data_cleaned.drop(columns=['NMHC(GT)', 'CO_Target']).corr()
sns.heatmap(corr_matrix, annot=True, cmap='Greens')
plt.title('Feature Correlation Matrix')
plt.show()
Feature engineering:
Rolling statistics are a way to calculate statistics over a moving window of fixed size across a time series. This technique is particularly useful for capturing temporal trends and patterns over time.
Here, two new rolling statistics are calculated with a window size of 24 (one day of hourly data).
Rolling Mean: the mean of the current and previous 23 values for each entry in the column, stored in a new column named f'{col}_rolling_mean'.
Rolling Standard Deviation: the standard deviation of the current and previous 23 values for each entry in the column, stored in a new column named f'{col}_rolling_std'.
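A minimal illustration on a toy series (window of 3 for brevity; the first window − 1 entries are NaN, which is why rows are dropped after the feature engineering below):
import pandas as pd
s = pd.Series([1, 2, 4, 8, 16])
print(s.rolling(window=3).mean())  # NaN, NaN, 2.33, 4.67, 9.33
print(s.rolling(window=3).std())   # NaN for the first two entries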
The dropped features are:
'CO(GT)' and 'CO_Target': These are the target variable and its binary representation, which should not be part of the features.
'NMHC(GT)': Dropped due to its massive number of missing values.
'T', 'RH', 'AH': These features were dropped based on their correlation with the target variable and other features.
# Feature Engineering
# Rolling statistics
for col in data_cleaned.columns:
    if col not in ['CO(GT)', 'CO_Target', 'NMHC(GT)', 'T', 'RH', 'AH']:
        data_cleaned[f'{col}_rolling_mean'] = data_cleaned[col].rolling(window=24).mean()
        data_cleaned[f'{col}_rolling_std'] = data_cleaned[col].rolling(window=24).std()
# Drop rows with NaN values created by rolling statistics
data_cleaned.dropna(inplace=True)
# Prepare the features (X) and target (y)
# X = data_cleaned.drop(columns=['CO(GT)', 'CO_Target', 'NMHC(GT)', 'PT08.S4(NO2)', 'T', 'RH', 'AH'])
X = data_cleaned.drop(columns=['CO(GT)', 'CO_Target', 'NMHC(GT)', 'T', 'RH', 'AH'])
y = data_cleaned['CO_Target']
# Split the data into training and combined validation/test sets
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3)
# Further split the combined set into validation and test sets
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5)
# Standardize the features
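# StandardScaler applies z = (x - mean) / std per feature; it is fit on the
# training set only so that validation/test statistics do not leak into training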
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)
print(f"Mean value for CO(GT): {co_mean}")
print(f"Total data size: {X.shape}")
print(f"Training data size: {X_train.shape}")
print(f"Validation data size: {X_val.shape}")
print(f"Testing data size: {X_test.shape}")
Mean value for CO(GT): 2.090075376884422
Total data size: (7308, 24)
Training data size: (5115, 24)
Validation data size: (1096, 24)
Testing data size: (1097, 24)
Dense Layer 1: 64 units, ReLU activation
Dropout Layer 1: 0.3 dropout rate
Dense Layer 2: 16 units, ReLU activation
Dropout Layer 2: 0.2 dropout rate
Output Layer: 1 unit, Sigmoid activation
Higher Dropout Rate: More neurons dropped, stronger regularization, higher risk of underfitting.
Lower Dropout Rate: Fewer neurons dropped, weaker regularization, higher risk of overfitting.
L2 regularization factor: 0.001
By adding a penalty for large weights, L2 regularization helps to prevent the model from fitting the training data too closely, which can lead to better generalization on unseen data.
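For reference, the penalized objective has the standard form total_loss = data_loss + λ · Σ wᵢ² over the kernel weights, with λ = 0.001 here.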
Loss Function: binary_crossentropy
This loss function is used for binary classification problems, where the goal is to predict one of two possible outcomes.
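For reference, its standard form is BCE = −(1/N) · Σᵢ [ yᵢ·log(ŷᵢ) + (1 − yᵢ)·log(1 − ŷᵢ) ], where ŷᵢ is the sigmoid output for sample i.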
Adam: Adam (Adaptive Moment Estimation) optimizer is used for both tasks.
Combines the advantages of two other extensions of stochastic gradient descent. Specifically, it uses adaptive learning rates and momentum.
Learning rate: 0.0012. Controls the step size during the optimization process. A smaller learning rate can lead to more precise convergence but may require more epochs to train.
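For reference, Adam's standard update is m_t = β₁·m_{t−1} + (1 − β₁)·g_t and v_t = β₂·v_{t−1} + (1 − β₂)·g_t², followed by θ_t = θ_{t−1} − lr · m̂_t / (√v̂_t + ε) with bias-corrected m̂_t and v̂_t; since only the learning rate is set explicitly here, the Keras defaults β₁ = 0.9 and β₂ = 0.999 apply.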
Batch size: 64
The number of training samples used in one forward and backward pass. A smaller batch size requires less memory and provides more updates to the model weights, while a larger batch size provides a more accurate estimate of the gradient but requires more memory.
Epochs: 150
The number of times the entire training dataset is passed forward and backward through the neural network. More epochs can lead to better training but also increase the risk of overfitting.
Verify the training and validation accuracy: if the training accuracy continues to increase while the validation accuracy plateaus or decreases, this indicates overfitting.
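One way to automate this check would be a Keras EarlyStopping callback, sketched below; note it was not used in the runs reported here, which train for the full 150 epochs.
from tensorflow.keras.callbacks import EarlyStopping
# Stop when val_loss has not improved for 10 epochs, keeping the best weights
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
# Usage: classification_model.fit(..., callbacks=[early_stop])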
# Build the neural network
classification_model = Sequential([
Input(shape=(X_train_scaled.shape[1],)),
Dense(64, activation='relu', kernel_regularizer=l2(0.001)),
Dropout(0.3),
Dense(16, activation='relu', kernel_regularizer=l2(0.001)),
Dropout(0.2),
Dense(1, activation='sigmoid')
])
# Compile the model
classification_model.compile(optimizer=Adam(learning_rate=0.0012), loss='binary_crossentropy', metrics=['accuracy'])
# Show the model summary, including the number of parameters per layer
classification_model.summary()
# Train the neural network
history = classification_model.fit(X_train_scaled, y_train, epochs=150, batch_size=64, validation_data=(X_val_scaled, y_val))
# Plot combined training history
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
# Plot accuracy
ax1.plot(history.history['accuracy'], label='Training Accuracy')
ax1.plot(history.history['val_accuracy'], label='Validation Accuracy')
ax1.set_title('Model Accuracy')
ax1.set_xlabel('Epochs')
ax1.set_ylabel('Accuracy')
ax1.legend()
# Plot loss
ax2.plot(history.history['loss'], label='Training Loss')
ax2.plot(history.history['val_loss'], label='Validation Loss')
ax2.set_title('Model Loss')
ax2.set_xlabel('Epochs')
ax2.set_ylabel('Loss')
ax2.legend()
plt.tight_layout()
plt.show()
# Save the classification model
classification_model.save('classification_model.keras')
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Layer (type)         ┃ Output Shape ┃   Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ dense (Dense)        │ (None, 64)   │     1,600 │
│ dropout (Dropout)    │ (None, 64)   │         0 │
│ dense_1 (Dense)      │ (None, 16)   │     1,040 │
│ dropout_1 (Dropout)  │ (None, 16)   │         0 │
│ dense_2 (Dense)      │ (None, 1)    │        17 │
└──────────────────────┴──────────────┴───────────┘
Total params: 2,657 (10.38 KB)
Trainable params: 2,657 (10.38 KB)
Non-trainable params: 0 (0.00 B)
Epoch 1/150   80/80 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.7063 - loss: 0.5959 - val_accuracy: 0.8814 - val_loss: 0.3711
Epoch 2/150   80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 555us/step - accuracy: 0.8878 - loss: 0.3778 - val_accuracy: 0.8823 - val_loss: 0.3376
[... intermediate epochs trimmed; training accuracy climbs steadily while validation loss keeps falling ...]
Epoch 50/150  80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 547us/step - accuracy: 0.9147 - loss: 0.2261 - val_accuracy: 0.9243 - val_loss: 0.2151
Epoch 100/150 80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 502us/step - accuracy: 0.9317 - loss: 0.2004 - val_accuracy: 0.9325 - val_loss: 0.1977
Epoch 150/150 80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 978us/step - accuracy: 0.9433 - loss: 0.1739 - val_accuracy: 0.9370 - val_loss: 0.1882
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score
# Evaluate and Predict the model on the test set
# Threshold has been set to 0.5 since this is a binary classification
test_loss, test_accuracy = classification_model.evaluate(X_test_scaled, y_test)
y_test_pred = (classification_model.predict(X_test_scaled) > 0.5).astype(int)
# Generate the confusion matrix
conf_matrix = confusion_matrix(y_test, y_test_pred)
# Extract TP, FP, TN, FN from the confusion matrix
tn, fp, fn, tp = conf_matrix.ravel()
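# For reference: Accuracy = (TP + TN) / (TP + TN + FP + FN), Precision = TP / (TP + FP)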
# Calculate accuracy
accuracy = accuracy_score(y_test, y_test_pred)
# Calculate precision
precision = precision_score(y_test, y_test_pred)
# Number of samples
n_samples = len(y_test)
# Display the results in a table
classification_results = pd.DataFrame(
{
"Metric": [
"True Positives (TP)",
"False Positives (FP)",
"True Negatives (TN)",
"False Negatives (FN)",
"Accuracy",
"Precision",
],
"Value": [tp, fp, tn, fn, f"{accuracy * 100:.4f}%", f"{precision * 100:.4f}%"],
}
)
print(classification_results)
fig, ax = plt.subplots(figsize=(4, 1))
ax.xaxis.set_visible(False)
ax.yaxis.set_visible(False)
ax.set_frame_on(False)
table = ax.table(
cellText=[[f"{accuracy * 100:.4f}%", f"{precision * 100:.4f}%", n_samples]],
colLabels=["Accuracy", "Precision", "#Samples"],
cellLoc="center",
loc="center",
)
plt.title(
"Accuracy and precision for the test data for the classification task", fontsize=12
)
plt.show()
# Plot the confusion matrix
plt.figure(figsize=(5, 4))
sns.heatmap(conf_matrix, annot=True, fmt="d")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("Confusion Matrix")
plt.show()
35/35 ━━━━━━━━━━━━━━━━━━━━ 0s 330us/step - accuracy: 0.9390 - loss: 0.1909
35/35 ━━━━━━━━━━━━━━━━━━━━ 0s 598us/step
                 Metric     Value
0   True Positives (TP)       453
1  False Positives (FP)        40
2   True Negatives (TN)       573
3  False Negatives (FN)        31
4              Accuracy  93.5278%
5             Precision  91.8864%
# Code Delimiter
pass
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error
import tensorflow as tf
import seaborn as sns
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2
# Read data from the given xlsx file
AQ_file = 'AirQualityUCI _ Students.xlsx'
AQ_data = pd.read_excel(AQ_file)
# Calculate the minimum and maximum values of each column and create a DataFrame to display the results
min_max_df = pd.DataFrame({'Min': AQ_data.min(), 'Max': AQ_data.max()})
print(min_max_df)
# min_max_df
# Convert Date column to datetime
AQ_data['Date'] = pd.to_datetime(AQ_data['Date'])
# Set Date as the index
AQ_data.set_index('Date', inplace=True)
# Drop the Time column
AQ_data.drop(columns=['Time'], inplace=True)
# Check for missing values
# As stated in the problem context, missing values have been tagged with -200
# Hence, replace -200 with NaN for missing data
# Count the occurrences of -200 in each column
missing_value_tally = (AQ_data == -200).sum()
print("Tally of missing value (-200):")
print(missing_value_tally)
data_replaced = AQ_data.replace(-200, np.nan)
# Handle missing data using linear interpolation
data_interpolated = data_replaced.interpolate(method='linear', limit_direction='forward', axis=0)
# Detect outliers using IQR
Q1 = data_interpolated.quantile(0.25)
Q3 = data_interpolated.quantile(0.75)
IQR = Q3 - Q1
# Define outliers as points outside 1.5*IQR range
outliers_lower_bound = Q1 - 1.5 * IQR
outliers_upper_bound = Q3 + 1.5 * IQR
# Build a boolean mask flagging all the outliers
outliers = (data_interpolated < outliers_lower_bound) | (data_interpolated > outliers_upper_bound)
# Plot data with outliers marked
for column in data_interpolated.columns:
    plt.figure(figsize=(20, 4))
    plt.plot(data_interpolated[column], label=column, color='purple')
    plt.plot(data_interpolated[column][outliers[column]], 'r*', label='Outliers')
    plt.title(f'{column} (With Outliers Marked)')
    plt.xlabel('Date')
    plt.ylabel(column)
    plt.legend()
    plt.show()
# data_cleaned = data_interpolated
# Replace outliers with NaN
data_interpolated[outliers] = np.nan
# Fill NaN values resulting from outlier detection using linear interpolation
data_cleaned = data_interpolated.interpolate(method='linear', limit_direction='forward', axis=0)
# Calculate the mean value for CO(GT), excluding missing values
co_mean = data_cleaned['CO(GT)'].mean()
# Since NMHC(GT) has so many missing values,
# its few actually valid values end up flagged as outliers,
# so we drop this feature in the subsequent processing
# Compute and plot the correlation matrix
plt.figure(figsize=(8, 8))
corr_matrix = data_cleaned.drop(columns='NMHC(GT)').corr()
sns.heatmap(corr_matrix, annot=True, cmap='Greens')
plt.title('Feature Correlation Matrix')
plt.show()
                             Min                  Max
Date         2004-03-10 00:00:00  2005-04-01 00:00:00
Time                    00:00:00             23:00:00
CO(GT)                    -200.0                 11.9
PT08.S1(CO)               -200.0              2007.75
NMHC(GT)                    -200                 1189
C6H6(GT)                  -200.0            63.741476
PT08.S2(NMHC)             -200.0               2214.0
NOx(GT)                   -200.0               1479.0
PT08.S3(NOx)              -200.0              2682.75
NO2(GT)                   -200.0                339.7
PT08.S4(NO2)              -200.0               2775.0
PT08.S5(O3)               -200.0              2522.75
T                         -200.0                 44.6
RH                        -200.0            87.174999
AH                        -200.0             2.231036
Tally of missing value (-200):
CO(GT)           1585
PT08.S1(CO)       366
NMHC(GT)         7525
C6H6(GT)          366
PT08.S2(NMHC)     366
NOx(GT)          1573
PT08.S3(NOx)      366
NO2(GT)          1576
PT08.S4(NO2)      366
PT08.S5(O3)       366
T                 366
RH                366
AH                366
dtype: int64
The dropped features are:
'NOx(GT)': The target variable.
'NMHC(GT)', 'PT08.S4(NO2)', 'T', 'RH', 'AH': These features were dropped based on their correlation with the target and the other features.
# Split the data
n = len(data_cleaned)
train_size = int(n * 0.7)
val_size = int(n * 0.15)
train_data = data_cleaned[:train_size]
val_data = data_cleaned[train_size:train_size + val_size]
test_data = data_cleaned[train_size + val_size:]
# Drop features and create target
X_train = train_data.drop(columns=['NOx(GT)', 'NMHC(GT)', 'PT08.S4(NO2)', 'T', 'RH', 'AH'])
y_train = train_data['NOx(GT)']
X_val = val_data.drop(columns=['NOx(GT)', 'NMHC(GT)', 'PT08.S4(NO2)', 'T', 'RH', 'AH'])
y_val = val_data['NOx(GT)']
X_test = test_data.drop(columns=['NOx(GT)', 'NMHC(GT)', 'PT08.S4(NO2)', 'T', 'RH', 'AH'])
y_test = test_data['NOx(GT)']
# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)
Dense Layer 1: 32 units, ReLU activation
Dropout Layer 1: 0.5 dropout rate
Dense Layer 2: 16 units, ReLU activation
Dropout Layer 2: 0.5 dropout rate
Output Layer: 1 unit, linear activation
Higher Dropout Rate: More neurons dropped, stronger regularization, higher risk of underfitting.
Lower Dropout Rate: Fewer neurons dropped, weaker regularization, higher risk of overfitting.
L2 regularization factor: 0.01
By adding a penalty for large weights, L2 regularization helps to prevent the model from fitting the training data too closely, which can lead to better generalization on unseen data.
Loss Function: mean_squared_error
This loss function measures the average squared difference between the actual and predicted values, commonly used in regression problems.
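For reference, its standard form is MSE = (1/N) · Σᵢ (yᵢ − ŷᵢ)², so larger errors are penalized quadratically.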
Adam: Adam (Adaptive Moment Estimation) optimizer is used for both tasks.
Combines the advantages of two other extensions of stochastic gradient descent. Specifically, it uses adaptive learning rates and momentum.
Learning rate: 0.001. Controls the step size during the optimization process. A smaller learning rate can lead to more precise convergence but may require more epochs to train.
Batch size: 64
The number of training samples used in one forward and backward pass. A smaller batch size requires less memory and provides more updates to the model weights, while a larger batch size provides a more accurate estimate of the gradient but requires more memory.
Epochs: 100
The number of times the entire training dataset is passed forward and backward through the neural network. More epochs can lead to better training but also increase the risk of overfitting.
# Build the neural network
regression_model = Sequential([
Input(shape=(X_train_scaled.shape[1],)),
Dense(32, activation='relu', kernel_regularizer=l2(0.01)),
Dropout(0.5),
Dense(16, activation='relu', kernel_regularizer=l2(0.01)),
Dropout(0.5),
Dense(1, activation='linear')
])
# Compile the model
regression_model.compile(optimizer=Adam(learning_rate=0.001), loss='mean_squared_error', metrics=['mae'])
# Print the model summary
regression_model.summary()
# Train the neural network
history = regression_model.fit(X_train_scaled, y_train, epochs=100, batch_size=64, validation_data=(X_val_scaled, y_val))
# Plot training history
plt.figure(figsize=(8, 6))
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Estimation Task - Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
# Save the model
regression_model.save('regression_model.keras')
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Layer (type)         ┃ Output Shape ┃   Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ dense_3 (Dense)      │ (None, 32)   │       256 │
│ dropout_2 (Dropout)  │ (None, 32)   │         0 │
│ dense_4 (Dense)      │ (None, 16)   │       528 │
│ dropout_3 (Dropout)  │ (None, 16)   │         0 │
│ dense_5 (Dense)      │ (None, 1)    │        17 │
└──────────────────────┴──────────────┴───────────┘
Total params: 801 (3.13 KB)
Trainable params: 801 (3.13 KB)
Non-trainable params: 0 (0.00 B)
Epoch 1/100   92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 61553.3867 - mae: 194.9862 - val_loss: 127764.2266 - val_mae: 314.1122
Epoch 2/100   92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 560us/step - loss: 56309.1289 - mae: 184.6492 - val_loss: 109803.7109 - val_mae: 290.4947
[... intermediate epochs trimmed; both training and validation losses drop sharply over the first ~10 epochs and then plateau ...]
Epoch 50/100  92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 462us/step - loss: 12285.7109 - mae: 77.1056 - val_loss: 10148.6084 - val_mae: 71.3279
Epoch 100/100 92/92 ━━━━━━━━━━━━━━━━━━━━ 0s 464us/step - loss: 12420.6582 - mae: 76.0137 - val_loss: 9181.8975 - val_mae: 67.0625
# Evaluate the regression model on the test set
test_loss, test_mae = regression_model.evaluate(X_test_scaled, y_test)
print(f"Test Mean Absolute Error: {test_mae:.4f}")
# Predict NOx concentrations on the validation set
y_val_pred = regression_model.predict(X_val_scaled)
# Plot true vs predicted NOx concentrations
plt.figure(figsize=(20, 6))
plt.plot(y_val.values, label='MLP Actual NOx(GT) Values (Validation)')
plt.plot(y_val_pred, label='MLP Estimated NOx(GT) Values (Validation)')
plt.title('MLP Validation Phase - Actual vs Estimated NOx(GT) Values')
plt.xlabel('Sample Index')
plt.ylabel('NOx(GT) Value')
plt.legend()
plt.show()
# Predict NOx concentrations on the test set
y_test_pred = regression_model.predict(X_test_scaled)
# Calculate RMSE and MAE of the test set
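# For reference: RMSE = sqrt(mean((y - y_pred)^2)), MAE = mean(|y - y_pred|)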
rmse = np.sqrt(mean_squared_error(y_test, y_test_pred))
mae = mean_absolute_error(y_test, y_test_pred)
# Number of samples
n_samples = len(y_test)
# Create a DataFrame to display the performance index table
regression_results_index_table = pd.DataFrame({
'RMSE': [f'{rmse:.4f}'],
'MAE': [f'{mae:.4f}'],
'#Samples': [n_samples]
})
print(regression_results_index_table)
fig, ax = plt.subplots(figsize=(4, 1))
ax.xaxis.set_visible(False)
ax.yaxis.set_visible(False)
ax.set_frame_on(False)
regression_index_table = ax.table(cellText=regression_results_index_table.values,
colLabels=regression_results_index_table.columns,
cellLoc='center',
loc='center')
plt.title('Regression Results Index Table', fontsize=14)
plt.show()
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 287us/step - loss: 5058.7817 - mae: 55.3928
Test Mean Absolute Error: 47.9600
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 544us/step
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 306us/step
      RMSE      MAE  #Samples
0  63.7866  47.9600      1255
References:
[1] Dekking, Frederik Michel; Kraaikamp, Cornelis; Lopuhaä, Hendrik Paul; Meester, Ludolf Erwin (2005). A Modern Introduction to Probability and Statistics. Springer Texts in Statistics. London: Springer. doi:10.1007/1-84628-168-7. ISBN 978-1-85233-896-1.